
[PERF] Pass-through multithreaded_io flag in read_parquet #1484

Merged 2 commits into main on Oct 11, 2023

Conversation

jaychia
Contributor

@jaychia jaychia commented Oct 11, 2023

Passes the `multithreaded_io=False` flag through to `read_parquet` when running on the Ray Runner.
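The pass-through pattern described above can be sketched as follows. This is a minimal illustrative sketch, not Daft's actual internals: the function names (`read_parquet`, `_read_parquet_table`) and the return values are assumptions made for the example.

```python
# Hedged sketch: threading a multithreaded_io flag from a user-facing
# read function down to the low-level reader. A distributed runner
# (e.g. the Ray Runner) would pass multithreaded_io=False so that each
# worker performs I/O single-threaded.

def _read_parquet_table(path: str, multithreaded_io: bool) -> str:
    # Stand-in for the low-level parquet reader.
    mode = "multithreaded" if multithreaded_io else "single-threaded"
    return f"read {path} ({mode})"

def read_parquet(path: str, multithreaded_io: bool = True) -> str:
    # The user-facing API simply forwards the flag unchanged.
    return _read_parquet_table(path, multithreaded_io=multithreaded_io)
```

The point of the change is that the flag is forwarded rather than hard-coded, so the Ray Runner can opt out of multithreaded I/O per worker.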


codecov bot commented Oct 11, 2023

Codecov Report

Merging #1484 (fdb0849) into main (439f2bd) will increase coverage by 0.03%.
Report is 1 commit behind head on main.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1484      +/-   ##
==========================================
+ Coverage   74.86%   74.89%   +0.03%     
==========================================
  Files          60       60              
  Lines        6102     6102              
==========================================
+ Hits         4568     4570       +2     
+ Misses       1534     1532       -2     
File                                Coverage          Δ
daft/execution/execution_step.py    92.30% <ø>        (ø)
daft/io/_parquet.py                 100.00% <100.00%> (+5.26%) ⬆️
daft/table/table_io.py              96.52% <ø>        (+0.69%) ⬆️

@jaychia jaychia merged commit a24e918 into main Oct 11, 2023
24 checks passed
@jaychia jaychia deleted the jay/parquet-multithreaded-io branch October 11, 2023 18:26
jaychia added a commit that referenced this pull request Oct 11, 2023
…thread (#1485)

Updates default max_connections value from 64 to 8

Also renames `max_connections` in internal APIs to
`max_connections_per_io_thread` to be more explicit, but keeps naming
for external-facing APIs for backwards compatibility

Note that the total number of connections spawned for the PyRunner is
`min(8, num CPUs) * max_connections`, and these are shared throughout
the multithreaded backend.

The total number of connections spawned for the RayRunner after #1484
is `num_ray_workers * 1 (since we run single-threaded) *
max_connections`.
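The connection arithmetic above can be checked with a short sketch. The specific values for `num_cpus` and `num_ray_workers` are illustrative assumptions; only the formulas come from the commit message.

```python
# Hedged sketch of the connection-count formulas described above.

def pyrunner_total_connections(num_cpus: int, max_connections: int) -> int:
    # PyRunner: min(8, num CPUs) I/O threads, each with max_connections,
    # shared throughout the multithreaded backend.
    return min(8, num_cpus) * max_connections

def rayrunner_total_connections(num_ray_workers: int, max_connections: int) -> int:
    # RayRunner after #1484: each worker runs single-threaded,
    # so the per-worker thread factor is 1.
    return num_ray_workers * 1 * max_connections

# With the new default max_connections = 8:
print(pyrunner_total_connections(num_cpus=16, max_connections=8))        # 64
print(rayrunner_total_connections(num_ray_workers=4, max_connections=8)) # 32
```

Lowering the default from 64 to 8 shrinks both totals by the same factor, since `max_connections` is a multiplicative term in each formula.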

---------

Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>